In this document we analyse the results from the Weibull parameter estimation on the METABRIC dataset.

## here() starts at /home/n.brouwer/MEP
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ✔ purrr   1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Loading required package: lattice
## 
## Loading required package: survival
## 
## Loading required package: Formula
## 
## 
## Attaching package: 'Hmisc'
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units
## 
## 
## corrplot 0.92 loaded
## 
## 
## Attaching package: 'gridExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
## 
## 
## ========================================
## ComplexHeatmap version 2.14.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite either one:
## - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
## - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##     genomic data. Bioinformatics 2016.
## 
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
## 
## 
## ========================================
## tidyHeatmap version 1.8.1
## If you use tidyHeatmap in published research, please cite:
## 1) Mangiola et al. tidyHeatmap: an R package for modular heatmap production 
##   based on tidy principles. JOSS 2020.
## 2) Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(tidyHeatmap))
## ========================================
## 
## 
## 
## Attaching package: 'tidyHeatmap'
## 
## 
## The following object is masked from 'package:stats':
## 
##     heatmap
## 
## 
## ========================================
## circlize version 0.4.15
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
##   in R. Bioinformatics 2014.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(circlize))
## ========================================

Load data

## `summarise()` has grouped output by 'phenotype_from'. You can override using
## the `.groups` argument.
## Rows: 709 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (6): metabric_id, ERStatus, LymphNodesOrdinal, sizeOrdinal, PAM50, IntClust dbl
## (3): Grade, yearsToStatus, DeathBreast lgl (2): ERBB2_pos, isValidation
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.

Initial parameters

Prior to parameter estimation, parameters are initialized on data on the [0,100] interval using the ‘fitdist’ package. This is a quick and sloppy estimation.

Samples can be divided into four categories: 1. No initialization and no estimation possible (combination does not occur) 2. No initialization, but estimation (distances outside [0,100] interval or initialization exceeded time) 3. Initialization, but no estimation (Not possible to fit Weibull distribution) 4. No initialization and no estmation, although combination is present.

This last category requires more analyses to find out why the estimation failed.

## [1] "Number of samples possible to estimate (combination exists): 314761"
## [1] "Total number of samples from which the parameters are initialized: 264056 ( 83.89 %)"
## [1] "Total number of samples from which the parameters are estimated: 88243 ( 28.03 %)"
## [1] "Number of samples from which the parameters are estimated, but not initialized: 2323 ( 0.88 %)"
## [1] "Number of samples from which the parameters are initialized, but not estimated: 178136 ( 67.46 %)"
## [1] "Number of samples from which the parameters are not estimated and not initialized: 48382 ( 15.37 %)"

Failed parameter estimations

Which combinations could not be estimated at all?

##  [1] "CD4^{+} T cells & APCs_to_CD57^{+}"                
##  [2] "CD57^{+}_to_CD4^{+} T cells & APCs"                
##  [3] "CD57^{+}_to_CK^{+} CXCL12^{+}"                     
##  [4] "CD57^{+}_to_CK8-18^{+} ER^{hi}"                    
##  [5] "CD57^{+}_to_CK8-18^{hi}CXCL12^{hi}"                
##  [6] "CD57^{+}_to_CK8-18^{hi}ER^{lo}"                    
##  [7] "CD57^{+}_to_Ep Ki67^{+}"                           
##  [8] "CD57^{+}_to_ER^{hi}CXCL12^{+}"                     
##  [9] "CD57^{+}_to_Fibroblasts FSP1^{+}"                  
## [10] "CD57^{+}_to_HER2^{+}"                              
## [11] "CD57^{+}_to_Ki67^{+}"                              
## [12] "CD57^{+}_to_Macrophages & granulocytes"            
## [13] "CD57^{+}_to_MHC^{hi}CD15^{+}"                      
## [14] "CK^{+} CXCL12^{+}_to_MHC^{hi}CD15^{+}"             
## [15] "CK^{lo}ER^{med}_to_Granulocytes"                   
## [16] "CK8-18^{+} ER^{hi}_to_CD57^{+}"                    
## [17] "CK8-18^{+} ER^{hi}_to_Granulocytes"                
## [18] "CK8-18^{+} ER^{hi}_to_HER2^{+}"                    
## [19] "CK8-18^{+} ER^{hi}_to_MHC^{hi}CD15^{+}"            
## [20] "CK8-18^{hi}CXCL12^{hi}_to_CD57^{+}"                
## [21] "CK8-18^{hi}ER^{lo}_to_CD57^{+}"                    
## [22] "CK8-18^{hi}ER^{lo}_to_HER2^{+}"                    
## [23] "CK8-18^{hi}ER^{lo}_to_Ki67^{+}"                    
## [24] "CK8-18^{hi}ER^{lo}_to_MHC I^{hi}CD57^{+}"          
## [25] "CK8-18^{hi}ER^{lo}_to_MHC^{hi}CD15^{+}"            
## [26] "Ep CD57^{+}_to_Ki67^{+}"                           
## [27] "Ep CD57^{+}_to_Macrophages & granulocytes"         
## [28] "Ep CD57^{+}_to_MHC^{hi}CD15^{+}"                   
## [29] "ER^{hi}CXCL12^{+}_to_MHC^{hi}CD15^{+}"             
## [30] "Fibroblasts FSP1^{+}_to_CD4^{+} T cells"           
## [31] "Fibroblasts FSP1^{+}_to_CD57^{+}"                  
## [32] "Fibroblasts FSP1^{+}_to_Macrophages & granulocytes"
## [33] "Fibroblasts FSP1^{+}_to_MHC I & II^{hi}"           
## [34] "Fibroblasts FSP1^{+}_to_MHC I^{hi}CD57^{+}"        
## [35] "Fibroblasts FSP1^{+}_to_MHC^{hi}CD15^{+}"          
## [36] "Granulocytes_to_CD57^{+}"                          
## [37] "Granulocytes_to_CK^{lo}ER^{med}"                   
## [38] "Granulocytes_to_MHC I^{hi}CD57^{+}"                
## [39] "HER2^{+}_to_CD57^{+}"                              
## [40] "HER2^{+}_to_CK8-18^{hi}ER^{lo}"                    
## [41] "HER2^{+}_to_MHC I^{hi}CD57^{+}"                    
## [42] "Ki67^{+}_to_Basal"                                 
## [43] "Ki67^{+}_to_CD57^{+}"                              
## [44] "Ki67^{+}_to_Ep CD57^{+}"                           
## [45] "Macrophages & granulocytes_to_CD57^{+}"            
## [46] "Macrophages & granulocytes_to_Ep CD57^{+}"         
## [47] "Macrophages_to_CD57^{+}"                           
## [48] "MHC I^{hi}CD57^{+}_to_CK8-18^{hi}ER^{lo}"          
## [49] "MHC I^{hi}CD57^{+}_to_Ep Ki67^{+}"                 
## [50] "MHC I^{hi}CD57^{+}_to_Fibroblasts FSP1^{+}"        
## [51] "MHC I^{hi}CD57^{+}_to_Granulocytes"                
## [52] "MHC I^{hi}CD57^{+}_to_HER2^{+}"                    
## [53] "MHC I^{hi}CD57^{+}_to_MHC^{hi}CD15^{+}"            
## [54] "MHC^{hi}CD15^{+}_to_B cells"                       
## [55] "MHC^{hi}CD15^{+}_to_CD57^{+}"                      
## [56] "MHC^{hi}CD15^{+}_to_CK^{+} CXCL12^{+}"             
## [57] "MHC^{hi}CD15^{+}_to_CK8-18^{+} ER^{hi}"            
## [58] "MHC^{hi}CD15^{+}_to_ER^{hi}CXCL12^{+}"             
## [59] "MHC^{hi}CD15^{+}_to_Fibroblasts FSP1^{+}"          
## [60] "MHC^{hi}CD15^{+}_to_MHC I^{hi}CD57^{+}"            
## [61] "T_{Reg} & T_{Ex}_to_CK8-18^{hi}ER^{lo}"

Look at the slides of the samples that were not estimated, but initialized.

Look at the slides of the samples that were not initialized, but estimated.

Look at the slides of the samples that were not initialized and not estimated.

We would like to find out why our method fails in initializing and estimating certain samples. A possible explanation is that the comination exist, but is very rare.

Haakje: Zijn de samples met hoge counts in cell types met veel 0 counts, verrijkt in bepaaplde subtypes of iets dergelijks?

## Warning in matrix(not_estimated_counts %>% pull(n), byrow = T, nrow =
## length(unique(not_estimated_counts %>% : data length [1022] is not a
## sub-multiple or multiple of the number of rows [32]

## Saving 7 x 5 in image

## Saving 7 x 5 in image

There are still samples with high counts, but without estimation.

## [1] "Number of samples with phenotype_to or phenotype_from > 1000: 154"

There are also samples with high counts of both from and to cell type.

## [1] "Number of samples with both phenotype_to and from count 50+ : 9"

Estimated parameters

Do the parameters look like a C-curve?

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Removed 258 rows containing non-finite values (`stat_density2d()`).
## Warning: The following aesthetics were dropped during statistical transformation: label
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 258 rows containing missing values (`geom_point()`).

## Saving 7 x 5 in image
## Warning: Removed 258 rows containing non-finite values (`stat_density2d()`).
## Warning: The following aesthetics were dropped during statistical transformation: label
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 258 rows containing missing values (`geom_point()`).
## Warning in geom_point(data = means %>% mutate(type = "combination means"), :
## Ignoring unknown aesthetics: label
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Removed 258 rows containing missing values (`geom_point()`).

## Saving 7 x 5 in image
## Warning: Removed 258 rows containing missing values (`geom_point()`).
## Rows: 1152 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (6): sample, phenotype_combo, translational_response, Biopsy, PA_Respons...
## dbl (5): a, b, A, B, ptID
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Removed 258 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).

## Saving 7 x 5 in image
## Warning: Removed 258 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).

Look at the extremes of the curve.

## [1] "Ep Ki67^{+}_to_MHC^{hi}CD15^{+}"              
## [2] "Endothelial_to_Endothelial"                   
## [3] "CK^{+} CXCL12^{+}_to_Macrophages"             
## [4] "MHC I & II^{hi}_to_Macrophages & granulocytes"
## [5] "ER^{hi}CXCL12^{+}_to_Ep CD57^{+}"             
## [6] "CD4^{+} T cells_to_MHC^{hi}CD15^{+}"          
## [7] "Myofibroblasts_to_CK^{lo}ER^{med}"
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Removed 258 rows containing missing values (`geom_point()`).

## Warning: Removed 258 rows containing missing values (`geom_point()`).

Now look at specific ‘X to celltype’ parameters similar to the paper by Gil-Jimenez et al.

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Saving 7 x 5 in image
## Warning: Removed 3 rows containing missing values (`geom_point()`).

## Saving 7 x 5 in image
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Saving 7 x 5 in image

## Saving 7 x 5 in image
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Saving 7 x 5 in image

## Saving 7 x 5 in image
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Saving 7 x 5 in image

## Saving 7 x 5 in image
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Saving 7 x 5 in image

## Saving 7 x 5 in image
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

## Saving 7 x 5 in image

## Saving 7 x 5 in image

Compare the parameters of the six cell types of Alberto with our parameters. Note: We map the cell types of Alberto to our cell type classification, but both groups are not entirely the same and the cell type groups of Alberto contain more diverse cells.

## Saving 7 x 5 in image

## Saving 7 x 5 in image

## Saving 7 x 5 in image

## Saving 7 x 5 in image

## Saving 7 x 5 in image

## Saving 7 x 5 in image

Heatmap

What patterns in the parameters do we find?

## The automatically generated colors map from the minus and plus 99^th of
## the absolute values in the matrix. There are outliers in the matrix
## whose patterns might be hidden by this color mapping. You can manually
## set the color to `col` argument.
## 
## Use `suppressMessages()` to turn off this message.
## The automatically generated colors map from the minus and plus 99^th of
## the absolute values in the matrix. There are outliers in the matrix
## whose patterns might be hidden by this color mapping. You can manually
## set the color to `col` argument.
## 
## Use `suppressMessages()` to turn off this message.

By clustering samples, we identify patient subtypes with potential associations to clinical features. This enables us to measure the effectiveness of our spatial characterization. For a comparative analysis, we cluster samples on: 1. densities 2. close-far combinations 3. parameters

density_matrix <- tibble(tnumber = unique(cell_occurences$tnumber))

for (c in unique(cell_occurences$meta_description)){
  density_matrix <- left_join(x = density_matrix, y= cell_occurences %>% 
                              filter(meta_description == c),
                            by='tnumber') %>% select(-c(meta_description))
  names(density_matrix)[names(density_matrix) == 'n'] = c
}

density_hm <- generate_hm(density_matrix, 5, cluster_rows = TRUE,cluster_columns = FALSE, 2,15,'samples', 'cell type')
## The automatically generated colors map from the 1^st and 99^th of the
## values in the matrix. There are outliers in the matrix whose patterns
## might be hidden by this color mapping. You can manually set the color
## to `col` argument.
## 
## Use `suppressMessages()` to turn off this message.
if (save_files){
save_pdf(density_hm, here('output/density_heatmap.pdf'), width=20, height=12)
    }

draw(density_hm)

print(subtype_proportions(row_order(density_hm))+ ggtitle('clusters based on cell type densities'))
## Warning: The heatmap has not been initialized. You might have different results
## if you repeatedly execute this function, e.g. when row_km/column_km was
## set. It is more suggested to do as `ht = draw(ht); row_order(ht)`.

# TODO: close-far cell type estimation
# Parameters
draw_shape_hm <- draw(shape_hm)

clusters_shape <- row_order(draw_shape_hm)
draw_scale_hm <- draw(scale_hm)

clusters_scale <- row_order(draw_scale_hm)

print(subtype_proportions(clusters_shape) + ggtitle('clusters based on shape parameter'))

print(subtype_proportions(clusters_scale) + ggtitle('clusters based on scale parameter'))

UMAP

Plot the parameters as UMAP to see if there are clusters of the same cancer subtype.

library(umap)

shape_matrix <- generate_matrix(read_rds(here('scratch/shape_parameter_matrix.rds')))
scale_matrix <- generate_matrix(read_rds(here('scratch/scale_parameter_matrix.rds')))

labels <- merge(as_tibble(rownames(shape_matrix)), clinical_data %>% select(ImageNumber, PAM50), by.x='value', by.y='ImageNumber', all.x=T) %>% rename(tnumber = value) %>% mutate(tnumber = as.numeric(tnumber))

generate_UMAP <- function(matrix, labels, title){
  umap <- umap(matrix)
  umap <- umap$layout %>%
  as.data.frame()%>%
  rename(UMAP1="V1",
         UMAP2="V2") %>%
  mutate(tnumber=row_number())%>%
  inner_join(labels, by='tnumber')

  return(umap %>%
  ggplot(aes(x = UMAP1, 
             y = UMAP2, 
             color = PAM50))+
  geom_point()+ theme_bw() +
  labs(x = "UMAP1",
       y = "UMAP2",
      subtitle = title))
}

print(generate_UMAP(shape_matrix, labels, 'UMAP plot of shape parameters'))

print(generate_UMAP(scale_matrix, labels, 'UMAP plot of scale parameters'))

scratch

#Generate a correlation matrix

# parameter_matrix <- tibble(tnumber = unique(all_parameters$tnumber))
# 
# for (c in unique(all_parameters$phenotype_combo)){
#   parameter_matrix <- left_join(x = parameter_matrix, y= all_parameters %>% 
#                               select(c(tnumber, phenotype_combo, shape)) %>% 
#                               filter(phenotype_combo == c),
#                             by='tnumber') %>%
#                       select(-c(phenotype_combo))
#   names(parameter_matrix)[names(parameter_matrix) == 'shape'] = c
# }
# 
# parameter_matrix <- parameter_matrix %>% select(-c(tnumber))
# 
# correlation_matrix <- Hmisc::rcorr(as.matrix(parameter_matrix), type="pearson")
# 
# flattenCorrMatrix <- function(cormat, pmat, nmat) {
#   ut <- upper.tri(cormat)
#   data.frame(
#     row = rownames(cormat)[row(cormat)[ut]],
#     column = rownames(cormat)[col(cormat)[ut]],
#     cor  =(cormat)[ut],
#     p = pmat[ut],
#     n = nmat[ut]
#     )
# }
# 
# flattened_correlation_matrix <- flattenCorrMatrix(correlation_matrix$r, correlation_matrix$P, correlation_matrix$n) %>% drop_na()  %>%
#   filter(n > 200) %>%
#   mutate(sig_p = ifelse(p < .05, T, F), p_if_sig = ifelse(p <.05, p, NA), r_if_sig = ifelse(p <.05, cor, NA))
# 
# flattened_correlation_matrix %>% 
#  ggplot(aes(row, column, fill=cor, label=round(r_if_sig,2))) +
#  geom_tile() + 
#  labs(x = NULL, y = NULL, fill = "Pearson's\nCorrelation")+ 
#   theme(axis.text = element_text(size = 1))  + 
#  scale_fill_gradient2(mid="#FBFEF9",low="#0C6291",high="#A63446", limits=c(-1,1)) +
#  scale_x_discrete(expand=c(0,0)) + 
#  scale_y_discrete(expand=c(0,0))
# significant_correlations <- flattened_correlation_matrix %>% 
#   filter(sig_p == TRUE) %>%
#   filter(p_if_sig > 0) %>%
#   arrange(p_if_sig) 
# 
# significant_correlations %>% head()
# 
# 
# for (c in 1:5){
# point <- all_parameters %>%
#     filter(phenotype_combo == significant_correlations[c,"row"] | phenotype_combo == significant_correlations[c,"column"]) %>%
#     ggplot(aes(x=shape, y=scale, color=phenotype_combo)) +
#     geom_point(alpha=0.4, size=3) +
#     ylim(10,300) + xlim(0,6) +
#     theme_bw() + scale_y_log10() +
#     xlab('Shape') + ylab('Scale') +
#     theme(legend.position='right') + ggtitle(paste('correlated with p-value: ', signif(significant_correlations[c,"p_if_sig"],digits=3))) + 
#   guides(colour = guide_legend(override.aes = list(size=10)))
#   
#   print(point)
# }